Large-scale sequencing of cDNAs randomly picked from libraries has proven to be a very powerful approach 
to discover (putatively) expressed sequences that, in turn, once mapped, may greatly expedite the process 
involved in the identification and cloning of human disease genes. However, the integrity of the data and the 
pace at which novel sequences can be identified depend to a great extent on the cDNA libraries that are 
used. Because altogether, in a typical cell, the mRNAs of the prevalent and intermediate frequency classes 
comprise as much as 50-65% of the total mRNA mass, but represent no more than 1000-2000 different 
mRNAs, redundant identification of mRNAs of these two frequency classes is destined to become 
overwhelming relatively early in any such random gene discovery programs, thus seriously compromising 
their cost-effectiveness. With the goal of facilitating such efforts, previously we developed a method to 
construct directionally cloned normalized cDNA libraries and applied it to generate infant brain (1NIB) and 
fetal liver/spleen (1NFLS) libraries, from which a total of 45,192 and 86,088 expressed sequence tags, 
respectively, have been derived. While improving the representation of the longest cDNAs in our libraries, 
we developed three additional methods to normalize cDNA libraries and generated over 35 libraries, most of 
which have been contributed to our Integrated Molecular Analysis of Genomes and Their Expression 
(IMAGE) Consortium and thus distributed widely and used for sequencing and mapping. In an attempt to 
facilitate the process of gene discovery further, we have also developed a subtractive hybridization approach 
designed specifically to eliminate (or reduce significantly the representation of) large pools of arrayed and 
(mostly) sequenced clones from normalized libraries yet to be (or just partly) surveyed. Here we present a 
detailed description and a comparative analysis of four methods that we developed and used to generate 
normalized cDNA libraries from human (15), mouse (3), rat (2), as well as the parasite Schistosoma mansoni (1). In 
addition, we describe the construction and preliminary characterization of a subtracted liver/spleen library 
(1NFLS-S1) that resulted from the elimination (or reduction of representation) of ~5000 1NFLS-IMAGE clones 
from the 1NFLS library. 



Large-scale single-pass sequencing of cDNA 
clones randomly picked from libraries has proven 
to be a powerful approach to discover genes (Ad- 
ams et al. 1991, 1993a,b, 1995; Khan et al. 1992; 
McCombie et al. 1992; Okubo et al. 1992; Mat- 
subara and Okubo 1993; see also Hillier et al., this 
issue). However, the significance of using cDNA 
libraries that are well suited for this purpose 
should not be underestimated (Adams et al. 
1993b). 

Ordinary cDNA libraries may contain a high 
frequency of undesirable ("junky") clones (Ad- 
ams et al. 1991, 1992) that may not only drasti-
cally impair the overall efficiency of the ap-
proach, but also seriously compromise the integ- 
rity of the data that are generated. Among such 
junky clones are: (1) clones that consist exclu- 
sively of poly(A) tails of mRNAs; (2) clones that 
contain very short cDNA inserts; (3) clones that 
contain nothing but the 3' half of the NotI-
oligo(dT)18 primer used for synthesis of first-
strand cDNA ligated to an adaptor; and (4) chi- 
meric clones, i.e., cDNAs derived from different 
mRNAs joined artifactually during ligation. Fur-
thermore, given that, as a general rule, the fre-
quency of occurrence of a cDNA clone in a library
is equivalent to that of its corresponding mRNA 
in the cell, even high-quality cDNA libraries may 
not be ideal for large-scale sequencing. 

Reassociation-kinetics analysis indicates that
the mRNAs of a typical somatic cell are distrib- 
uted in three frequency classes: (1) superpreva- 
lent (consisting of about 10-15 mRNAs that alto- 
gether represent 10-20% of the total mRNA 
mass); (2) intermediate (1000-2000 mRNAs; 40- 
45%); and (3) complex (15,000-20,000 mRNAs; 
40-45%) (Bishop et al. 1974; Davidson and Brit- 
ten 1979). Accordingly, once most mRNAs of the 
prevalent and intermediate frequency classes are 
identified, redundancy levels are expected to be- 
come greater than 60%. For this reason, the use 
of normalized libraries, in which the frequency of 
all clones is within a narrow range (Soares et al. 
1994), has been shown to be beneficial for large- 
scale sequencing (Berry et al. 1995; Houlgatte et 
al. 1995). Calculations show that at C0t = 5.5
[where C0 is the total DNA concentration and t is
the time (moles of nucleotide per liter × sec)], of
the three kinetic classes of mRNAs, the most 
abundant species are diminished drastically, 
while all frequencies are brought within the 
range of one order of magnitude (Soares et al. 
1994). 
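
The arithmetic behind this argument can be sketched in a few lines of Python. The example below is a minimal illustration, not the calculation of Soares et al. (1994): it assumes ideal second-order reassociation, takes the midpoints of the class sizes and mass fractions quoted above, and uses a purely hypothetical rate constant (`K_ASSUMED`). All function and variable names are invented for this sketch.

```python
# Frequency classes quoted in the text; midpoints of each quoted range
# are used purely for illustration (an assumption, not a measurement).
CLASSES = {
    "superprevalent": {"n_species": 12.5, "mass_fraction": 0.15},
    "intermediate": {"n_species": 1500.0, "mass_fraction": 0.425},
    "complex": {"n_species": 17500.0, "mass_fraction": 0.425},
}

# Hypothetical rate constant (L mol^-1 s^-1), chosen only for illustration.
K_ASSUMED = 1000.0


def per_species_frequency(cls):
    """Average frequency of a single mRNA species within a class."""
    return cls["mass_fraction"] / cls["n_species"]


def fraction_unreassociated(freq, c0t, k=K_ASSUMED):
    """Ideal second-order kinetics: C/C0 = 1 / (1 + k * C0t).

    Each species reassociates at its own effective C0t (freq * c0t).
    """
    return 1.0 / (1.0 + k * freq * c0t)


def remaining_single_stranded(freqs, c0t, k=K_ASSUMED):
    """Unnormalized abundance of each species left single-stranded."""
    return {
        name: f * fraction_unreassociated(f, c0t, k)
        for name, f in freqs.items()
    }


def spread(freqs):
    """Ratio between the most and least frequent species."""
    values = list(freqs.values())
    return max(values) / min(values)


def abundant_mass_fraction():
    """Mass fraction of the superprevalent plus intermediate classes,
    i.e. the expected redundancy once those mRNAs are all identified."""
    return (CLASSES["superprevalent"]["mass_fraction"]
            + CLASSES["intermediate"]["mass_fraction"])


if __name__ == "__main__":
    start = {name: per_species_frequency(c) for name, c in CLASSES.items()}
    after = remaining_single_stranded(start, c0t=5.5)
    print("spread before:", round(spread(start), 1))
    print("spread after: ", round(spread(after), 1))
    print("abundant mass fraction:", abundant_mass_fraction())
```

Under these assumptions the most abundant species are depleted far more strongly than the rare ones, because the depletion factor grows with the product of frequency and C0t. The real rate constant depends on fragment length, salt, and temperature, so the numbers printed here should be read only as a qualitative illustration.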

However, because a large fraction of all hu- 
man genes has been identified already, redun- 
dant identification of genes that are expressed in 
multiple tissues cannot be avoided simply by the 
use of normalized libraries. Hence, we argue that 
the use of subtractive cDNA libraries enriched for 
genes expressed at low levels and that have not 
yet been identified should become increasingly 
more advantageous for large-scale sequencing 
programs. 

While attempting to improve the representa- 
tion of the longest cDNAs in our libraries, we 
developed three methods for construction of nor- 
malized libraries, in addition to the procedure 
that we described previously (Soares et al. 1994), 
and used them successfully to generate normal- 
ized cDNA libraries from human (15), mouse (3), 
rat (2), and Schistosoma mansoni (1) tissues. All 
human and mouse cDNA libraries have been con- 
tributed to the Integrated Molecular Analysis of 
Genomes and Their Expression (IMAGE) Consor- 
tium (Lennon et al. 1996), and to date a total of 
315,408 expressed sequence tags (ESTs) have 
been derived from these libraries (dbEST release 
052396; http://www.ncbi.nlm.nih.gov). 

Here we present a detailed description and a 
comparative analysis of the four methods that we 
have developed to normalize cDNA libraries; we 
describe a simple procedure for the construction 
of subtractive cDNA libraries; and we discuss 



strategies that take advantage of subtractive hy- 
bridization to expedite the ongoing IMAGE/ 
Washington University/Merck gene discovery 
program. 

RESULTS 

While attempting to improve the representation 
of the longest cDNAs in our normalized libraries, 
we developed four methods and constructed over 
35 libraries, most of which are described here. A 
list comprising 15 human, three mouse, two rat, 
and one schistosome library with their respective 
names, number of recombinants, sequence tags, 
and methods used for normalization and prepa- 
ration of single-stranded plasmids is shown in 
Table 1. 

Extensive characterization of two normalized 
libraries [normalized infant brain (1NIB) and nor-
malized fetal liver/spleen (1NFLS)] constructed accord-
ing to our previously described procedure (Soares 
et al. 1994; here designated as method 1) con- 
firmed our original observations that a great ex- 
tent of normalization can be achieved with this 
method for most cDNA species (e.g., cf. lanes 
9,10 in Fig. 1M-P). It is noteworthy that the fre- 
quency of cDNA 122 (used as the probe in P) was 
increased with normalization from <0.0006% in 
the starting library to 0.007% in the 1NIB library 
(Soares et al. 1994). However, Southern hybrid- 
ization of starting and normalized libraries with a 
battery of cDNA probes revealed that on occasion 
truncated clones were favored over their longest 
counterparts during the process. This was first ob- 
served when Southern blots of NotI + HindIII-
digested plasmid DNA from starting and normal-
ized infant brain libraries were hybridized with a
cDNA probe for mitochondrial 16S rRNA (see Fig.
1L, lanes 9,10). Not only was the frequency of
these mitochondrial cDNA clones not reduced ef- 
fectively during the process of normalization (fre- 
quency of occurrence in starting and normalized 
infant brain libraries was 1.4% and 1.0%, respec- 
tively), but also the length of the hybridizing cD- 
NAs was noticeably shorter in the normalized li-
brary. Comparative sequence analysis (not
shown) of a number of hybridizing mitochon- 
drial 16S rRNA clones from both starting and nor- 
malized libraries revealed that whereas the 3' end 
of most cDNAs derived from the starting library 
corresponded to the bona fide 3' end of the 16S 
rRNA, the 3' end of the majority of the cDNAs 
isolated from the normalized library corre- 
sponded to sequences further upstream on the 



792 GENOME RESEARCH 



cDNA-BASED APPROACHES TO FACILITATE GENE DISCOVERY



Table 1. Complete List and Main Features of the Normalized Human, Mouse, Rat, and
Schistosome cDNA Libraries

                                                                Number of
                                                                recombinants   Preparation
                                                                in the         of single-   Method of
                                      Normalized                normalized     stranded     normal-
mRNA source                           library name              library        plasmids     ization

Human infant brain (b)                1NIB                      2,500,000      in vivo      1
Human fetal liver spleen (c)          Nb2HFLS20W (1NFLS)        19,000,000     in vivo      1
                                      5Nb2HFLS20W               3,200,000      in vitro     2-1
                                      6Nb2HFLS20W               1,400,000      in vitro     2-3
                                      14Nb2HFLS20W              3,200,000      in vitro     4
                                      15Nb2HFLS20W              ~35,000,000    in vitro     2-2
Human term placenta                   Nb2HP                     750,000        in vivo      2-1
Human 8-9W placenta                   2NbHP8-9W                 100,000        in vitro     2-3
Human breast (d)                      2NbHbst-3NbHBst (e)       2,090,000      in vivo      2-1
Human adult brain (f)                 N2b4HB55Y-N2b5HB55Y (g)   3,170,000      in vivo      2-1
Human retina (h)                      2N2b4HR-N2b5HR            1,600,000      in vivo      2-1
Human pineal gland (i)                3NbHPC                    1,000,000      in vitro     2-1
Human ovary tumor (j)                 NbHOT                     1,100,000      in vivo      2-1
Human melanocytes (k)                 2NbHM                     6,800,000      in vitro     2-3
Human fetal heart (l)                 NbHH19W                   9,700,000      in vitro     4
Human parathyroid adenoma (m)         NbHPA                     3,400,000      in vitro     4
Human senescent fibroblast (n)        NbHSF                     9,900,000      in vitro     4
Human multiple sclerosis plaques (o)  2NbHMSP                   1,100,000      in vitro     3
Human fetal lung (l)                  NbHL19W                   21,700,000     in vitro     4
19.5-dpc mouse embryo (p)             p3NMF19.5                 3,400,000      in vitro     4
17.5-dpc mouse embryo (p)             NbME17.5                  6,800,000      in vitro     4
13.5- to 14.5-dpc mouse embryos (p)   NbME13.5-14.5             380,000        in vitro     4
Rat heart (q)                         NbRH                      400,000        in vitro     4
Rat kidney (q)                        2NbRK                     130,000        in vitro     4
8-week-old adult schistosome (r)      NbS8W                     1,000,000      in vitro     4

Library tag (a), listed in column order: ACCAA, AGATCT, ACCAA, CA, CC, GC, AC, CC, CG, AG, ATC,
ACCAA, AACCA, CA, AA, ACAAC, CACAC, GGAAA, ACAAC, CAAAC, CAAAG



With the exception of 1NIB, which was constructed in the Lafmid BA vector, all libraries were constructed in the pT7T3-Pac vector. Cloning sites were
NotI and EcoRI except for fetal liver spleen (PacI and EcoRI) and infant brain (NotI and HindIII).

(a) [...] present in the oligonucleotide used to prime the synthesis of first-strand cDNA, between
the sequence for the rare restriction enzyme (NotI, or PacI in the case of the liver spleen library) used for directional cloning and the dT [...]
[...] in the human parathyroid adenoma, senescent fibroblast, mouse embryo, rat, and Schistosoma mansoni libraries) located at the 3' end of the primer.
(b) Human infant brain (kindly provided by Dr. Conrad Gilliam, Columbia University, New York, NY) was from a 72-day-old female who died in [...]

(c) Human fetal liver/spleen (kindly provided by Dr. Stephen Brown, Columbia Presbyterian Medical Center, New York, NY) was from a 20-week-old [...]

(d) [...] from normal breast pooled from reduction mammoplasty tissue was kindly provided by Dr. Anne Bowcock and Ms.
Monique Spillman, University of Texas Southwestern Medical Center at Dallas.

(e) 2NbHbst differs from 3NbHbst in the C0t used for hybridization (237 and 20, respectively).

(f) [...] RNA (kindly provided by Dr. Donald Gilden, University of Colorado Health Sciences Center [...]
55-year-old male who died of a ruptured aortic aneurysm. Brain tissue (frontal, parietal, temporal, and occipital cortex from the left and right
hemispheres, subcortical white matter, basal ganglia, thalamus, cerebellum, midbrain, pons, and medulla) was acquired 17-18 hr after death.
(g) N2b4HB55Y and 2N2b4HR differ from N2b5HB55Y and N2b5HR, respectively, in the average size of their cDNA inserts (1.5-2.5 kb and 0.4-1.5 kb).

(h) [...] cellular normal human retina RNA (kindly provided by Dr. Roderick R. McInnes, University of Toronto and Hospital for Sick Children, Canada) [...]

(i) [...] provided by Dr. David Klein, National Institute of Child Health and Human Development, National Institutes of Health
[...] of three pineal glands (gland 1: 48-year-old Caucasian male; gland 2: 18-year-old Caucasian female; gland 3: [...]

(j) [...] RNA was kindly provided by Dr. Anne Bowcock and Ms. Monique Spillman, [...]
Medical School. It was obtained from a 36-year-old Caucasian with a papillary serous cystadenocarcinoma grade III with surface extensions and [...]

(k) [...] cellular human melanocyte RNA (kindly provided by Dr. Anthony Albino and Dr. Alice de Oliveira, Memorial Sloan-Kettering Cancer Center, [...]

(l) Human fetal heart and lung (kindly provided by Dr. Stephen Brown, Columbia Presbyterian Medical Center) were derived from the same [...]

(m) [...] provided by Dr. Stephen Marx, National Institute of Diabetes and Digestive and Kidney Diseases, NIH) was [...]

(n) [...] normal human fibroblasts was kindly provided by Dr. Barbara Burhart (National
[...]). The cells were prepared by passaging normal human fibroblasts derived from foreskin until they exhibited an enlarged, flattened [...]

(p) [...] cellular RNA for construction of the mouse (C57BL/6 strain) embryonic libraries was kindly provided by Dr. Minoru Ko (Wayne State University, [...]
(q) Rat tissues were obtained from an adult Zivic-Miller Sprague Dawley female and were kindly provided by Dr. Stephen Brown (Columbia Presbyterian [...]
(r) [...] RNA from mature 8-week-old Schistosoma mansoni worms was kindly provided by Dr. Ron Blanton, Case Western Reserve University,
Cleveland, OH.

Figure 1 Comparative analysis of starting and normal-
ized cDNA libraries by Southern hybridization with 14
cDNA probes. The 0.015 µg PacI + EcoRI-digested plas-
mid DNA from the starting fetal liver/spleen library (lane
6), from the normalized fetal liver/spleen libraries con-
structed according to method 2-1 (lane 1), method 2-3
(lane 2), method 2-2 (lane 3), method 1 (lane 4), method
4 (lane 5), and from the liver/spleen mini-libraries en-
riched for abundant cDNAs (HAP-bound fractions) gener-
ated with method 2-1 (lane 7) and method 4 (lane 8)
were electrophoresed on 1% agarose gels, transferred to
nylon membranes (GeneScreen Plus; DuPont/NEN) and
hybridized at 42°C in 50% formamide, 5x Denhardt's
solution, 0.75 M NaCl, 0.15 M Tris (pH 7.5), 0.1 M sodium
phosphate, 0.1% sodium pyrophosphate, 2% SDS con-
taining sheared and denatured salmon sperm DNA at 100
µg/ml. Similarly, 0.05 µg NotI + HindIII-digested plasmid
DNA from the starting (IB; lane 9) and normalized (1NIB;
lane 10; method 1) infant brain libraries (Soares et al.
1994) were electrophoresed, transferred, and hybridized
as described above. Radioactive probes were prepared by
random primed synthesis using the Prime-it II kit (Strata-
gene). The following probes were used: α-globin (A),
β-globin (B), γ-globin (C), serum albumin (D, shorter exposure; E, longer exposure), acidic ribosomal phospho-
protein P0 (F), H19 RNA (G, shorter exposure; H, longer exposure), apolipoprotein A (I), angiotensinogen (J),
unknown cDNA 8 (K), mitochondrial 16S rRNA (L), α-tubulin (M), myelin basic protein (N), secretogranin (O),
and unknown cDNA 122 (P). All probes were contaminated intentionally with a small amount of vector DNA to
enable visualization of vector bands and thus confirm that a similar amount of library DNA was loaded in all lanes.
(V) vector band, which is released from the cDNA inserts by double digestion with the restriction enzymes
specified above.




16S rRNA. The occurrence of such 3' truncations 
was also documented by sequence analysis (not 
shown) for serum albumin cDNAs in the fetal 
liver/spleen library (see Fig. 1D,E, lanes 4,6). 

Reasoning that this problem could be cir- 
cumvented if the fragments used in the hybrid- 
ization with the single-stranded circles (1) were 
in excess, and (2) spanned the entire length of 
the cDNAs, we developed an alternative proce- 
dure to normalize cDNA libraries based on hy- 
bridization of in vitro synthesized RNA (driver) 
from an entire library with the library itself in the 
form of single-stranded circles (tracer) (see meth- 
ods 2-1 and 2-2 in Fig. 2). Several normalized li- 
braries were generated by this procedure (see 
Table 1). 

Southern hybridization of endonuclease- 
restricted plasmid DNA from starting and nor- 
malized libraries with a number of cDNA probes 
(Fig. 1) indicated clearly that these methods ef- 
fectively improved the representation of the 
longest cDNAs in the normalized libraries (e.g., 
cf. lanes 1,4 in Fig. 1A,D,E,G,H). However, charac- 
terization of one of these libraries (5Nb2HFLS20W) 



by colony hybridization with cDNA probes (not 
shown) indicated that this approach was effective 
to reduce the frequency of some, but not all, of 
the most abundant clones (e.g., serum albumin 
was reduced about 20-fold, whereas γ-globin was
reduced only twofold). No difference was ob-
served when hybridizations were performed under
different conditions [0.4 M NaCl and 50% for-
mamide at 42°C as in methods 2-1 and 2-3; 0.12
M NaCl, 50% formamide, and 1% sodium dodecyl
sulfate (SDS) at 30°C as in method 2-2 (see lane 3
in Fig. 1); 0.4 M NaCl and 80% formamide at
42°C, not shown]. 

It is noteworthy that Northern hybridization 
(not shown) of in vitro transcribed RNA synthe- 
sized from an entire plasmid library with probes 
derived from the abundant cDNAs that failed to 
be normalized effectively by this procedure (e.g., 
globins in the fetal liver/spleen library and glyc- 
eraldehyde-3-phosphate dehydrogenase (G3PD) 
in the breast library) indicated that they were not 
as prevalent in the population of in vitro tran- 
scribed RNAs as they were in their respective 
starting cDNA libraries. 

Figure 2 Diagram of the normalization
methods 2-1, 2-2, and 2-3. Double-
stranded plasmid DNA representing an en-
tire starting library is (1) linearized with ei-
ther SfiI, NotI, or PacI and used as template
for synthesis of RNA in vitro using T3 or T7
RNA polymerases, and (2) converted to
single-stranded circles either in vivo, upon
electroporation into DH5αF' and superin-
fection with M13KO7, or in vitro by the
combined action of Gene II and Exonucle-
ase III (Life Technologies). Single-stranded
plasmid DNA is HAP-purified and hybrid-
ized (C0t ~ 5) with excess RNA (pretreated
with RNase-free DNase I; Promega),
blocked with appropriate oligonucleotides
to prevent hybridization through common
vector sequences (see Methods section).
Both the fraction that remains single-
stranded (flow-through) as well as the re-
sulting hybrids (bound) are purified by HAP
chromatography. The HAP flow-through
fraction is converted to double-stranded
plasmids, electroporated into DH10B bac-
teria (Life Technologies), and propagated
under ampicillin selection to generate an
amplified normalized library (methods 2-1
and 2-2, depending on the conditions used
for hybridization; see Methods section).
The HAP-bound fraction is also converted
similarly to double-stranded plasmids, elec-
troporated into bacteria, and propagated
under ampicillin selection to generate a
mini-library enriched for abundant cDNAs.
Double-stranded plasmid DNA from this
mini-library is linearized and used as tem-
plate for synthesis of RNA in vitro. After di-
gestion of the plasmid DNA template with
ribonuclease-free DNase I (Promega), the
RNA (driver) is blocked with appropriate
oligonucleotides and hybridized (C0t
~100-200) with HAP-purified single-
stranded plasmids derived from the start-
ing library (see above). The remaining
single-stranded circles are purified by HAP chromatography,
converted to double-stranded circles, electropor-
ated into DH10B (Life Technologies), and propagated under
ampicillin selection to generate an amplified nor-
malized library (method 2-3).



A significantly improved extent of normal- 
ization was achieved when runoff RNA synthe- 
sized from the plasmid mini-library enriched for 
abundant cDNAs [hydroxyapatite (HAP)-bound
fraction of method 2-1 in Fig. 2] was hybridized
(C0t = 100-200) with single-stranded circles from
the starting library (see method 2-3 in Fig. 2 and 
Table 1; cf. lanes 1,2 in Fig. 1A-D,F,G). 

In an effort to preserve the positive charac- 



teristics of both methods 1 and 2 (i.e., the ad- 
equate extent of normalization achieved with 
method 1, and the improved representation of 
the longest cDNAs achieved with method 2), we 
developed two additional reassociation kinetics-
based procedures involving DNA-DNA hybrid-
ization (methods 3 and 4; see Fig. 3). 

Method 3, which was successfully used to 
construct a normalized library from multiple 
sclerosis plaques (see 2NbHMSP in Table 1), in-
volved hybridization of a 20-fold excess of single- 
stranded cDNA fragments (comprising the 5' 
halves of all inserts of the starting library, gener- 
ated by Exonuclease III digestion of gel-purified 
double-stranded cDNAs; see Fig. 3) with comple- 
mentary single-stranded circles produced in vitro 
by the combined action of Gene II and Exonucle- 
ase III (Life Technologies). 

Southern hybridization of NotI + EcoRI-
digested plasmid DNA from the starting and nor- 
malized (with methods 2-1 and 3) multiple scle- 
rosis plaques library with mitochondrial 16S 
rRNA and myelin basic protein cDNA probes (not 
shown) clearly indicated that method 3 was su- 
perior to method 2-1 in that a much greater ex- 
tent of normalization was achieved, at the same 
time that it maintained (similar to method 2-1) 
appropriate representation of the longest cDNAs 
in both cases. 

For the libraries constructed with method 4 
(see Table 1 and Fig. 3), double-stranded cDNA 
inserts generated by the polymerase chain reac- 
tion (PCR) with T3 and T7 primers were melted 
and hybridized (in the presence of vast excess of 
blocking oligonucleotides) with single-stranded 
plasmid library DNA prepared in vitro. 

Southern hybridization of PacI + EcoRI-
digested plasmid DNA from starting and normal-
ized (with methods 1, 2-1 to 2-3, and 4) fetal liver/
spleen libraries (Fig. 1) with several cDNA probes
(including those that revealed incomplete nor-
malization with methods 2-1 to 2-3, such as α-glo-
bin, β-globin, and γ-globin) demonstrated the ef-
ficacy of method 4 in achieving the desired ex- 
tent of normalization obtained with method 1 
(cf. lanes 1-6 in Fig. 1A-D, F-H, and lanes 3-6 in 
Fig. 1I-K) while preserving the representation of 
the longest cDNAs (e.g., the longest albumin 
cDNA was present in the normalized library pre- 
pared with method 4, shown in lane 5 of Fig. 
1D,E, but it was undetectable in the normalized 
library constructed with method 1, shown in 
lane 4; a similarly remarkable difference was re- 
vealed with the cDNA probe for H19 RNA, shown 
in Fig. 1G,H). Characterization of the normalized 
library generated with method 4 by colony hy- 
bridization with 10 cDNA probes (not shown), 
which occur at a wide range of frequencies in the 
starting library, confirmed the effectiveness of 
the procedure to narrow their frequencies down 
to within one order of magnitude (e.g., the fre- 
quencies of the cDNAs for γ-globin, α-globin,
β-globin, H19 RNA, and transferrin were reduced

Figure 3 Diagram of the normalization methods
3 and 4. In method 3 double-stranded plasmid DNA 
from a starting library is digested with restriction 
enzymes that generate 5' protruding ends, and 
the excised cDNA inserts are gel-purified from the 
cloning vector and digested with Exonuclease III 
to yield noncomplementary single-stranded frag- 
ments, each representing half of a cDNA insert. 
Note that the single-stranded fragments that 
span the 5' half (but not the 3' half) of the cDNA 
inserts are complementary to single-stranded plas- 
mids prepared in vitro. These single-stranded DNA 
fragments are blocked with appropriate oligo- 
nucleotides (see Methods) and hybridized with 
single-stranded library DNA prepared in vitro 
(middle column). The remaining single-stranded 
circles are HAP-purified, converted to double- 
stranded plasmids, electroporated into DH10B 
bacteria (Life Technologies), and propagated under 
ampicillin selection to generate a normalized li- 
brary. In method 4, single-stranded library DNA is 
used as template for PCR amplification with T3 
and T7 primers. PCR-amplified cDNAs are purified 
from excess primers, melted, and hybridized with 
single-stranded library DNA in the presence of 
blocking oligonucleotides. The remaining single- 
stranded circles are purified by HAP chromatog- 
raphy, converted to double-stranded plasmids, 
electroporated into bacteria, and propagated 
under ampicillin selection to generate a normalized 
library. 

from 9.2%, 6.4%, 3.6%, 1.8%, and <0.2% to 0.04%,
0.02%, 0.01%, 0.1% and 0.1%, respectively). 
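
The effect of method 4 on these five clones can be checked with a short calculation. The sketch below simply takes the frequencies quoted above and computes the fold change for each cDNA and the spread of the set before and after normalization. The value quoted as "<0.2%" for transferrin is treated as 0.2%, which is an assumption, so the starting spread is a lower bound. Function names are invented for this example.

```python
import math

# Clone frequencies (% of library) quoted in the text for the starting
# library and for the library normalized with method 4.
# Transferrin is reported as "<0.2%"; 0.2 is used as an upper bound.
FREQUENCIES = {
    "gamma-globin": (9.2, 0.04),
    "alpha-globin": (6.4, 0.02),
    "beta-globin": (3.6, 0.01),
    "H19 RNA": (1.8, 0.1),
    "transferrin": (0.2, 0.1),
}


def fold_reduction(before, after):
    """How many times less frequent a clone is after normalization."""
    return before / after


def log10_spread(values):
    """Orders of magnitude between the highest and lowest frequency."""
    values = list(values)
    return math.log10(max(values) / min(values))


if __name__ == "__main__":
    for name, (before, after) in FREQUENCIES.items():
        print(f"{name}: {fold_reduction(before, after):.0f}-fold reduction")
    start = [b for b, _ in FREQUENCIES.values()]
    final = [a for _, a in FREQUENCIES.values()]
    print(f"spread before: {log10_spread(start):.2f} orders of magnitude (lower bound)")
    print(f"spread after:  {log10_spread(final):.2f} orders of magnitude")
```

The most abundant clones are reduced by two orders of magnitude while the rarest clone changes little, which is the expected signature of a procedure that narrows frequencies rather than scaling them uniformly.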

In order to assess further the ability of these 
normalization procedures to preferentially re- 
duce the representation of the most abundant 
cDNAs, we have performed a comparative se- 
quence analysis (not shown) of 100 clones picked 
randomly from the fetal liver/spleen cDNA li- 
brary normalized with method 4 (14Nb2HFLS20W 
in Table 1; HAP-flow-through fraction in Fig. 3), 
and from two fetal liver/spleen mini-libraries en- 
riched for abundant cDNAs (HAP-bound frac- 
tions in Figs. 2 and 3) obtained during HAP pu- 
rification of the normalized libraries prepared ac- 
cording to methods 2-1 (5Nb2HFLS20W) and 4 
(14Nb2HFLS20W). A number of cDNAs known to 
be prevalent in the starting fetal liver/spleen li- 
brary (e.g., albumin, γ-globin, α-globin, β-globin,
mitochondrial RNAs, and apolipoproteins A and 
H) were found at increased frequencies in both 
mini-libraries enriched for abundant cDNAs, but 
none of them was represented in the sample of 
100 clones from the normalized library. It is note- 
worthy that while 47% of the sequences derived 
from the normalized library were not represented 
in the "all nonredundant" subdivision of se- 
quences of GenBank + EMBL + DDBJ + PDB, the 
majority of the sequences obtained from the 
mini-libraries of abundant cDNAs derived from 
methods 2-1 and 4 (91.4% and 86.9%, respec- 
tively) did have homologous sequences in that 
data base. Furthermore, although 49% of the se- 
quences derived from the normalized library had 
fewer than 10 homologous ESTs in the dbEST 
subdivision of GenBank, most of the sequences 
obtained from both mini-libraries had greater 
than 10 homologous ESTs in the dbEST data base 
(92.5% and 89.7%, respectively, in the HAP- 
bound fractions of methods 2-1 and 4). 

With the ultimate goal of facilitating the on- 
going process of gene discovery by large-scale se- 
quencing of cDNA clones picked randomly from 
libraries, we have performed a pilot subtractive 
hybridization experiment to eliminate (or reduce 
representation of) a pool of approximately 5000 
IMAGE Consortium-arrayed cDNA clones (pool 
no. 1, LLAM 78-90) from the normalized library 
from which they were derived (1NFLS in Table 1). 
PCR-amplified cDNA inserts from pool no. 1 were 
melted and hybridized, in the presence of block- 
ing oligonucleotides, with single-stranded plas- 
mid DNA from the 1NFLS library, prepared in 
vitro. The remaining single-stranded circles were 
purified by HAP chromatography, converted to 



double-stranded plasmids, electroporated into 
bacteria, and propagated under antibiotic selec- 
tion to generate the subtracted 1NFLS-S1 library 
(see Fig. 4). Preliminary characterization of the 
1NFLS-S1 library by Southern hybridization with
10 cDNA probes (only five are shown; see Fig. 5)
known to be represented in pool no. 1 indicated
clearly the effectiveness of the procedure to
eliminate (or to reduce the representation of) all
10 cDNA sequences in the 1NFLS library. A
BLASTN search of the dbEST division of GenBank 
(6/12/96) with 3' ESTs obtained from the five 
probes (cDNAs -1, -4, -8, -9, and -10) the hybrid- 
izations of which were not shown in Figure 5, 
revealed the presence of 0, 0, 1, 2, and 2 corre- 
sponding ESTs, respectively, from the 1NFLS li- 
brary, thus indicating that the subtraction was 
successful even for cDNAs that were under- 
represented in the normalized library (a total of 
44,407 3' ESTs have been derived from the 1NFLS 
library to date). It should be noted that because of 
sequencing failures, some of the clones in these 
arrays may not yet have corresponding ESTs in 
the public data bases. 
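
To put these EST counts in perspective, they can be converted to approximate clone frequencies. The sketch below assumes that ESTs were sampled at random from the 1NFLS library, so that hits divided by total ESTs estimates frequency; this is a simplification that ignores the sequencing failures noted above. For probes with zero hits it reports an exact one-sided 95% upper bound for a binomial proportion. The probe labels and function names are illustrative only.

```python
# Total number of 3' ESTs derived from the 1NFLS library (from the text).
TOTAL_3P_ESTS = 44_407

# Matching 1NFLS ESTs found by BLASTN for the five probes not shown in Fig. 5.
EST_HITS = {"cDNA 1": 0, "cDNA 4": 0, "cDNA 8": 1, "cDNA 9": 2, "cDNA 10": 2}


def est_frequency_percent(hits, total=TOTAL_3P_ESTS):
    """Point estimate of clone frequency (%), assuming random sampling."""
    return 100.0 * hits / total


def zero_hit_upper_bound(total=TOTAL_3P_ESTS, confidence=0.95):
    """Exact one-sided upper bound on a proportion when 0 hits are seen.

    Solves (1 - p) ** total = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / total)


if __name__ == "__main__":
    for probe, hits in EST_HITS.items():
        if hits == 0:
            bound = 100.0 * zero_hit_upper_bound()
            print(f"{probe}: 0 ESTs, frequency below about {bound:.4f}% (95% bound)")
        else:
            print(f"{probe}: {hits} ESTs, about {est_frequency_percent(hits):.4f}%")
```

Under these assumptions, all five probes correspond to clones present at well below 0.01% in the normalized library, which is consistent with the statement that subtraction worked even for under-represented cDNAs.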

It is noteworthy that when we attempted to 
perform the same subtractive hybridization ex- 
periment using, as driver, RNA synthesized in 
vitro from a plasmid DNA preparation of pool no. 
1, the results obtained were not satisfactory (not 
shown) in that subtraction could be demon- 
strated for some but not all tested clones (e.g.,
α-globin could not be subtracted effectively),
similar to what we observed in normalizations 
with method 2-1. 

DISCUSSION 

As a result of an effort to improve the represen- 
tation of the longest cDNAs in our normalized 
libraries, we have developed four different meth- 
ods for normalization of directionally cloned 
cDNA libraries constructed in phagemid vectors, 
while contributing resources to the IMAGE Con- 
sortium (Lennon et al. 1996) and thereby facili- 
tating the ongoing gene discovery and mapping 
programs. Approximately 87.5% of all (human) 
IMAGE ESTs were derived from the normalized 
libraries described here. 

The normalization procedure (method 1) 
that we described previously (Soares et al. 1994) 
was applied for the construction of the 1NIB 
and 1NFLS normalized libraries, from which a 
total of 45,192 and 86,088 ESTs, respectively, 
have been derived (dbEST release 052396; http:// 

Figure 4 Diagram of the subtractive hybridization
procedure used to generate the 1NFLS-S1 library.
Double-stranded plasmid DNA from a pool of
~5000 IMAGE Consortium-arrayed cDNA clones de-
rived from the 1NFLS library (pool no. 1, LLAM 78-
90) was converted to single-stranded circles in vitro
by the combined action of Gene II and Exonuclease
III (Life Technologies). The resulting single-stranded
plasmids were HAP-purified and used as a template
for PCR amplification with T3 and T7 primers. PCR-
amplified cDNA inserts were purified from excess
primers, melted, and hybridized with single-
stranded circles (prepared in vitro) from the 1NFLS
library, in the presence of appropriate blocking oli-
gonucleotides. The remaining single-stranded
circles were purified by HAP chromatography, con-
verted to double-stranded plasmids, electroporated
into DH10B bacteria (Life Technologies), and propa-
gated under ampicillin selection to generate the
(1NFLS-S1) subtracted library.



www.ncbi.nlm.nih.gov). Data analysis (see 
Hillier et al., this issue) demonstrated solidly the 
efficacy of this approach in bringing the fre- 
quency of all clones to within a narrow range. 
Extensive characterization of these two libraries 
by Southern analysis, however, revealed that on 






Figure 5 Characterization of the 1NFLS-S1 subtracted
liver/spleen library by Southern hybridization
with 5 cDNA probes. PacI + EcoRI-digested plasmid
DNA (0.15 µg) from the fetal liver/spleen
library normalized with method 1 (1NFLS; lane 1),
from the pool of ~5000 IMAGE Consortium-arrayed
cDNA clones derived from the 1NFLS library (pool
no. 1, LLAM 78-90; lane 2), from the subtracted
library generated according to the diagram shown
in Fig. 4 (1NFLS-S1; lane 3), and from the HAP-bound
fraction obtained during HAP purification of
the 1NFLS-S1 library (see Fig. 4) were electrophoresed,
transferred to nylon membranes, and hybridized
as described in the legend to Fig. 1. The following
cDNA probes were used: α-globin (A), γ-globin (B),
serum albumin (C), unknown cDNA 7 (D;
picked randomly from pool no. 1, LLAM 78-90), and
unknown cDNA 5 (E; picked randomly from pool
no. 1, LLAM 78-90). A BLASTN search of the dbEST
subdivision of GenBank with 3' ESTs derived from
cDNA 7 and cDNA 5 revealed the presence of 33
and 0 corresponding ESTs, respectively, from the
1NFLS library. All probes were contaminated intentionally
with a small amount of vector DNA to enable
visualization of vector bands and thus confirm
that a similar amount of library DNA was loaded in
all lanes. (V) vector band; (U) residual undigested
plasmid.



occasion truncated clones were favored over their 
longest counterparts during the normalization 
procedure. 

Because of the relatively permissive conditions
used for synthesis of first-strand cDNA,
priming with the NotI-tag-(dT)18 oligonucleotide
may occur not only at the poly(A) tail of the mRNAs
but also at internal A-rich sites within the
mRNAs (e.g., at Alu tails). Typically, cDNAs with
3' truncations occur at frequencies of 10-15% in
directionally cloned libraries. Truncated clones
can be recognized (tentatively) as such by the
absence of a bona fide polyadenylation signal
sequence at the appropriate distance upstream
from the oligo(dA)18 tail of the cDNA.
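
The tentative test for 3' truncation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hexamers AATAAA and ATTAAA, the 10-35 nt search window, and the minimum tail length of 10 are commonly used conventions chosen here for the example, not values given in the text, and the function name is hypothetical.

```python
import re

# Assumed, widely used polyadenylation hexamers (not specified in the text).
POLY_A_SIGNALS = ("AATAAA", "ATTAAA")


def looks_truncated(sense_seq, min_tail=10, window=(10, 35)):
    """Tentatively flag a 3'-truncated clone from its sense-strand 3' read.

    Returns None when the read has no terminal oligo(dA) run, False when a
    poly(A) signal starts within `window` nt upstream of the tail, and True
    otherwise (candidate internal-priming event).
    """
    seq = sense_seq.upper()
    tail = re.search(r"A{%d,}$" % min_tail, seq)
    if tail is None:
        return None
    near, far = window
    start = max(0, tail.start() - far)
    # A hexamer starting `near` nt upstream of the tail ends 6 nt later.
    stop = max(0, tail.start() - near + 6)
    region = seq[start:stop]
    return not any(signal in region for signal in POLY_A_SIGNALS)
```

A clone flagged True here is only a candidate; as the text notes, the assignment is tentative.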

Why may truncated cDNAs be favored over
their longest counterparts during normalization
by method 1? Briefly, method 1 (Soares et al.
1994) involves: (1) annealing of a single-stranded
DNA preparation of a directionally cloned cDNA
library with an oligo(dT)18 primer; (2) controlled
primer extension reactions in the presence of
deoxynucleotides and dideoxynucleotides to generate
3' noncoding extension products of approximately
200-300 nucleotides; (3) purification of
the resulting partially double-stranded circles by
HAP chromatography; (4) melting and reassociation
of the HAP-purified partially double-stranded
circles to a relatively low C0t (5-10); (5)
purification of the remaining single-stranded
circles (normalized library) over HAP; (6) conversion
of the single-stranded circles to double-stranded
circles; and (7) electroporation into bacteria.

It could be anticipated that during the reas- 
sociation reaction, because truncated cDNAs oc- 
cur at lower frequencies than their nontruncated 
counterparts, the extension products of the trun- 
cated cDNAs would more likely reanneal to the 
nontruncated overlapping cDNAs than to their 
own truncated templates. On the other hand, the 
extension products of the nontruncated cDNAs 
would most likely reassociate to their own non- 
truncated templates not only because they are 
more prevalent but also because of the low prob- 
ability of there being an overlap between the 
short extension product of a nontruncated clone 
and a truncated single-stranded circle. As a result, 
nontruncated single-stranded circles are more 
likely to end up reassociated with more than one 
(nonoverlapping) extension product, whereas 
their truncated counterparts would remain 
single-stranded and therefore end up in the HAP 
flow-through fraction (normalized library). 

Reasoning that this problem could be circumvented
if the hybridizing fragments (1) were
in excess over single-stranded circles, and (2)
spanned the entire length of the cDNAs to maximize
the opportunity of overlap between truncated
and nontruncated clones, we devised an
approach (methods 2-1 and 2-2; note that 2-2 is
the same as 2-1 except that hybridization conditions
were different) whereby in vitro synthesized
RNA from a plasmid DNA preparation of a
starting library is used as driver in hybridization
(C0t ~5) with the same library in the form of
single-stranded circles. Indeed, these modifications
successfully improved the representation of
the longest cDNAs in the normalized libraries
(e.g., serum albumin in the liver/spleen libraries).

However, in every library constructed with 
methods 2-1 and 2-2, we were able to identify 
cDNA clones that seemed to become normalized 
with much greater difficulty than others (e.g.,
α-globin in the 5Nb2HFLS20W liver/spleen library,
and G3PD in the breast library). We interpreted
these results as suggestive that not all
clones might be transcribed in vitro with the 
same efficiency if in a mixture (i.e., in vitro tran- 
scription of plasmid DNA from an entire library), 
and/or secondary structures in the RNAs (or in- 
teractions between RNAs) might impair their 
ability to hybridize with the single-stranded 
circles. These hypotheses were corroborated by 
the observation (not shown) that relatively weak 
hybridization signals were observed when North- 
ern blots of RNA transcribed in vitro from an en- 
tire plasmid library were hybridized with cDNA 
probes derived from those clones that could not 
be normalized as effectively, despite the fact that 
they occurred at high frequencies in the starting 
libraries from which the in vitro transcribed 
RNAs were synthesized. We did exclude the pos- 
sibility that the clones that were not being nor- 
malized effectively carried deletions that pre- 
vented them from being transcribed appropri- 
ately in vitro (not shown). In fact, all clones that 
were tested individually for in vitro transcription 
yielded the expected amounts of full-length RNA. 
Although this problem was significantly minimized
in method 2-3 (cf. lanes 1,2 in Fig. 1A-D,F,G),
the extent of normalization that was
achieved was still not comparable to that obtained
with method 1 (cf. lanes 2,4 in Fig. 1A-D,F,H).

The advantage of method 2-3 over methods
2-1 and 2-2 is that the RNA driver is derived from
a mini-library (of relatively low complexity) enriched
for abundant cDNAs rather than from the
entire starting library. For this reason, higher C0t
hybridizations can be carried out to eliminate or
reduce significantly the representation of the
most abundant cDNAs. It should be noted, however,
that method 2-3 is not a true normalization
procedure, because the aim of this approach is
not to equalize the frequency of all cDNA clones
but rather to reduce significantly (or even to
eliminate, depending on the C0t used) the representation
of the most abundant clones.

The extent to which the enrichment for
abundant transcripts can be achieved in such
mini-libraries depends essentially on the C0t used
for reassociation. Calculations based on estimates
of frequencies of brain mRNAs (Soares et al. 1994)
indicate that the best enrichments are obtained
at a C0t = 5-10. If the C0t is too low (≤1), the
enrichment is only for the most prevalent (class
I) mRNAs; there is no enrichment for the mRNAs
of the intermediate frequency class (class II).
On the other hand, if the C0t is too high
(>50), the enrichment for class I transcripts starts
to become less significant because of a higher
representation of mRNAs of the complex class (class
III). Prevalent and intermediate (classes I + II)
brain mRNAs comprise 93-95% of the total
cDNA population in a C0t = 5-10 HAP-bound
mini-library, in contrast to 62% in the starting
library. Consequently, the frequency of class III
transcripts in a C0t = 5-10 HAP-bound mini-library
is about 5.5-fold lower than that of the
starting library (5-7% in the bound mini-library
vs. 38% in the starting library).
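
The arithmetic behind this comparison can be reproduced with a short Python sketch. It assumes only the percentages quoted above (62% classes I + II in the starting library, 93-95% in the HAP-bound mini-library); the function name is illustrative. With these inputs the reduction of class III clones works out to roughly 5.4-fold at the 93% end of the range and 7.6-fold at the 95% end, so the "about 5.5-fold" figure corresponds to the lower bound.

```python
def class3_fold_reduction(start_class12, bound_class12):
    """Fold reduction of class III clones in the HAP-bound mini-library.

    Both arguments are the combined fractions of class I + II clones, so
    the class III fraction is simply the remainder in each library.
    """
    start_class3 = 1.0 - start_class12
    bound_class3 = 1.0 - bound_class12
    return start_class3 / bound_class3


for bound in (0.93, 0.95):
    fold = class3_fold_reduction(0.62, bound)
    print(f"classes I+II = {bound:.0%}: class III reduced {fold:.1f}-fold")
```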

Methods 3 and 4 were developed as a result 
of an attempt to achieve both the adequate ex- 
tent of normalization obtained with method 1 
and the improved representation of the longest 
cDNAs accomplished with methods 2-1, 2-2, and 
2-3. Although more technically cumbersome, 
method 3 is superior to method 4 in that the 
DNA driver used in the hybridization is single- 
stranded. 

Single-stranded driver in method 3 (see Fig. 
3) is generated by Exonuclease III digestion of 
gel-purified double-stranded cDNA inserts ex- 
cised from the starting library. The resulting non- 
complementary single-stranded fragments repre- 
sent the 5' and 3' halves of the original cDNA 
inserts. The fragments that correspond to the 5' 
halves of the cDNAs are complementary to 
single-stranded circles prepared in vitro, whereas 
the single-stranded fragments that correspond to 
the 3' halves of the cDNA inserts are complemen- 
tary to single-stranded plasmids prepared in vivo. 
Note that for the multiple sclerosis plaques li- 
brary constructed with method 3 we used single- 
stranded circles prepared in vitro. 

Production of single-stranded circles in vitro 
by the combined action of Gene II and Exonucle- 
ase III (Life Technologies), rather than in vivo by 
superinfection of a culture with a helper phage, is 
very beneficial because it circumvents the distor- 
tions that otherwise may arise as a result of the 
differential growth properties of clones with different
size inserts. However, because the digestion
with Gene II results in the conversion of
most, but not all, supercoiled plasmids to relaxed
circles, it becomes necessary to purify the single-stranded
circles that are produced after digestion
with Exonuclease III by HAP chromatography.

For construction of the normalized multiple
sclerosis plaques library, the cDNA inserts were
excised by double digestion of plasmid DNA from
the starting library with NotI and EcoRI. The fact
that one in every three clones might have an internal
EcoRI site (an EcoRI site is expected to occur
once every 4096 bp, and the average insert
size in these libraries is of the order of 1.4 kb)
should not compromise the efficiency of the procedure,
because at least one of the resulting restriction
fragments would be expected to be
>200 bp (clones smaller than 400 bp are size-selected
out of these libraries) and therefore be
able to form hybrids that would bind quantitatively
to HAP under our conditions. A disadvantage
of method 3, as presented, is that only
clones <2.9 kb (approximate vector size) can be
excised cleanly from the vector. It is conceivable,
however, that one might be able to use double-stranded
cDNA fragments generated by PCR amplification
with T3 and T7 primers as substrate
for the Exonuclease III digestion in method 3.
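
The "one in every three clones" estimate follows from the expected spacing of a 6-bp recognition site. The sketch below makes the calculation explicit; it assumes random sequence with equal base frequencies and a Poisson model for the number of sites per insert, both simplifications introduced here rather than taken from the text. Under those assumptions a 1.4-kb insert carries about 0.34 sites on average, and a little under 30% of inserts carry at least one.

```python
import math


def expected_sites(insert_bp, site_len=6):
    """Expected number of occurrences of one specific site of `site_len` bp."""
    return insert_bp / 4 ** site_len  # 4**6 = 4096 bp between sites on average


def prob_at_least_one_site(insert_bp, site_len=6):
    """Poisson estimate of the fraction of inserts with one or more sites."""
    return 1.0 - math.exp(-expected_sites(insert_bp, site_len))


print(f"expected sites per 1.4-kb insert: {expected_sites(1400):.2f}")
print(f"fraction with at least one site:  {prob_at_least_one_site(1400):.2f}")
```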

Method 4 was used to generate a significant 
fraction of the libraries that were contributed to 
the IMAGE Consortium (see Table 1). It is un- 
doubtedly the simplest and overall most advan- 
tageous of all procedures. Because the DNA driver 
is generated by PCR amplification of the starting 
(double-stranded or single-stranded, see below) 
plasmid library with T3 and T7 primers, the tracer 
(single-stranded circles) used in this hybridiza- 
tion may be produced in vitro or in vivo. 

The extent of normalization achieved with
method 4 was comparable to that obtained with
method 1 with the advantage that it successfully
preserved the representation of the longest
cDNAs (cf. lanes 4,5 in Fig. 1). Moreover, method
4 is superior to method 1 because it does not
preclude the clones derived from mRNAs with
internal NotI sites from being represented in the
normalized library. Because the starting material
for the reassociation kinetics reaction in method
1 is generated by a controlled primer extension
reaction with an oligo(dT)18 primer, clones without
an oligo(dA)18 tail (derived from mRNAs with
an internal NotI site) are not represented in the
final normalized library, although they are not
necessarily lost (clones without tails end up in
the HAP flow-through fraction during HAP purification
of the partially double-stranded circles
generated by this primer extension reaction). It
should also be noted that this problem of method
1 could be circumvented by the use of an oligonucleotide
complementary to flanking vector sequences
[as opposed to the oligo(dT)18] for this
controlled primer extension reaction.

The potential biases introduced by PCR amplification
in method 4 are minimized by the fact
that (1) PCR amplification products are used in
excess in these hybridizations, and (2) the size
distribution of inserts in these libraries is relatively
narrow (ranging typically from 0.4 to 2.5 kb).

The conditions used for hybridization greatly 
influenced the quality of the resulting normal- 
ized libraries constructed with method 4. This is 
to a great extent a consequence of the fact that 
we are using HAP to purify single-stranded circles, 
as opposed to a biotin-avidin capture system, 
which in our hands yielded significantly less satis- 
factory results (M.F. Bonaldo and M.B. Soares, un- 
publ.). The best results were obtained when the hy- 
bridization conditions were the most similar to the 
HAP conditions. We interpreted these results as 
suggestive of the fact that imperfect hybrids 
formed during hybridization may either not bind 
to HAP and/or may melt once in the HAP buffer. 

It is noteworthy that a much superior extent 
of normalization was obtained with method 4 
when single-stranded plasmid DNA prepared in 
vitro, as opposed to double-stranded plasmid 
DNA, was used as template for PCR amplification 
(not shown). These results suggest that a fraction 
of the double-stranded plasmids used as template 
for PCR amplification, presumably in the form of 
melted supercoiled DNA, might end up in the 
HAP flow-through fraction (normalized library) 
during purification. 

It is noteworthy that cross-hybridizing di- 
verged sequences seem to escape normalization 
in all of the procedures discussed above. For ex- 
ample, the frequency of Alu repeat-containing 
cDNAs (typically 10% in directionally cloned 
cDNA libraries) is practically the same in starting 
and normalized libraries. These results suggest 
that imperfect hybrids either do not bind to 
HAP under our conditions or melt once diluted 
in the (more stringent) HAP buffer. This is advan- 
tageous, not only because it preserves the repre- 
sentation of Alu-containing cDNAs that might 
correspond to otherwise rare mRNAs, but also, 
and most significant, because it minimizes the 
likelihood that a rare member of a gene family
might be excluded from the final (normalized
or subtracted) library as a result of a
cross-hybridization with a more prevalent but diverged
sequence.

The use of normalized libraries for large-scale 
gene discovery/EST programs is beneficial be- 
cause it minimizes redundancies while increasing 
the representation of the rarer cDNAs by about 
threefold, on average. However, given the great 
extent of overlap in gene expression among dif- 
ferent tissues, the use of normalized libraries 
alone is not sufficient to maintain a desirable 
pace of identification of novel sequences at advanced
stages of such programs. For this reason,
we propose that the use of subtracted libraries
enriched for clones not yet identified might become
increasingly advantageous. Toward this
goal, we have developed a subtractive hybridization
approach designed specifically for this purpose
(see Fig. 4). In a pilot experiment, we were
able to reduce significantly the representation of
~5000 1NFLS-IMAGE Consortium clones from
the 1NFLS library itself (see Fig. 5). With the
development of appropriate clustering algorithms,
the use of nonredundant sets of cDNA/gene se- 
quences as drivers for hybridizations to generate 
subtractive libraries enriched for novel sequences 
should soon become possible, and hopefully will 
facilitate the isolation of all human and mouse 
cDNAs still awaiting identification. 

METHODS 

Construction of Directionally Cloned cDNA 
Libraries 

Poly(A)+ RNA was purified from total cellular RNA (except
for senescent fibroblasts, from which cytoplasmic RNA was
isolated) using the Oligotex mRNA kit (Qiagen) according
to the manufacturer's instructions, except that two rounds
of purification were performed. cDNA library construction
was essentially as described before (Adams et al. 1993b;
Soares 1994). Typically, 1 µg poly(A)+ RNA was annealed at
37°C with a twofold mass excess of a NotI-tag-(dT)18
primer [or PacI-tag-(dT)18 in the case of the liver/spleen
library] and reverse transcribed at 37°C with Superscript
Reverse Transcriptase (Life Technologies). Alternatively,
poly(A)+ RNA was annealed at 45°C with a fourfold mass
excess of a NotI-tag-(dT)25 primer and reverse transcribed
at 45°C. The tag is a sequence of 2-6 nucleotides that is
unique for each library and thus serves as an identifier (see
Table 1). With the exception of infant brain, fetal liver/spleen,
and term placenta, all other first-strand cDNA
syntheses were primed with the following oligonucleotide:
TGTTACCAATCTGAAGTGGGAGCGGCCGC-tag-(dT)18 or 25.
The oligonucleotide AACTGGAAGAATTCGCGGCCGCAGGAA(dT)18
(Pharmacia) was used to prime
both infant brain and term placenta first-strand cDNA syntheses.
The oligonucleotide AACTGGAAGAATTAATTAAAGATCT(dT)18
was used to prime the synthesis of first-strand
fetal liver/spleen cDNA. Double-stranded cDNAs
were size-selected by gel filtration over a long (64-cm) and
narrow (0.2-cm diameter) Bio-Gel A-50m (Bio-Rad, 100-200
mesh) column, and ligated to a 500- to 1000-fold molar
excess of adapters. Infant brain cDNAs were ligated to
HindIII adapters, digested with NotI, size-selected over a
second Bio-Gel column, and cloned directionally into the
NotI and HindIII sites of the Lafmid BA vector (Soares et al.
1994). Fetal liver/spleen cDNAs were ligated to EcoRI
adapters (Pharmacia), size-selected as above, digested with
PacI, and cloned directionally into the PacI and EcoRI sites
of the pT7T3-Pac vector. All other cDNAs were ligated to
EcoRI adapters (Pharmacia), size-selected as above, digested
with NotI, and cloned directionally into the NotI and EcoRI
sites of the pT7T3-Pac vector. pT7T3-Pac is essentially the
same as pT7T318D (Pharmacia) with a modified
polylinker. Figure 6 shows the sequence of the pT7T3-Pac
polylinker and flanking sequences.



Production of Purified Covalently Closed 
Single-stranded Library DNA in Vitro 

Double-stranded phagemid DNA was converted to single-stranded
circles by the combined action of Gene II (phage
f1 endonuclease) and Escherichia coli Exonuclease III enzymes,
as per the manufacturer's instructions (Life Technologies;
cat. no. 10356-020). The resulting single-stranded
circular DNA was purified from the remaining
double-stranded plasmids by HAP chromatography (Bio-Rad)
as described previously (Soares et al. 1994). The replication
initiator protein of bacteriophage f1 (Gene II) is a
site-specific endonuclease that binds to the f1 origin in
phagemid vectors and nicks the viral strand of the supercoiled
DNA. The nicked strand is then digested from its 3'
end with Exonuclease III (Hoheisel 1993) to generate
single-stranded circles. Purification of the resulting single-stranded
circles over HAP is necessary because the conversion
of supercoiled to relaxed plasmids by Gene II is never
complete. The Gene II reaction was performed for 1 hr at
30°C and contained typically 4 µg supercoiled plasmid library
DNA, 1 µl Gene II (Life Technologies), and 2 µl
10× Gene II buffer (Life Technologies) in a total volume of
20 µl. The Gene II protein was heat inactivated for 5 min
at 65°C; the reaction mixture was chilled on ice; 2 µl Exonuclease
III (Life Technologies, cat. no. 18013-011, 65
units/µl) was added; and the reaction was incubated for 30
min at 37°C. Gene II and Exonuclease III were then digested
with Proteinase K (Boehringer Mannheim) for 15
min at 50°C in a 100-µl reaction containing 10 mM Tris



Figure 6 Sequence of the pT7T3-Pac polylinker (uppercase) and flanking
sequences (lowercase). [Sequence diagram not reproduced. Annotated
features, in order: M13 Reverse Sequencing Primer, T7 promoter, SfiI,
EcoRI, SnaBI, BamHI, PacI, NotI, HindIII, T3 promoter, M13 Sequencing
Primer.]



(pH 7.8), 5 mM ethylenediamine tetraacetic acid (EDTA),
0.5% SDS, and 136 µg Proteinase K. After extraction with an
equal volume of phenol-chloroform-isoamyl alcohol (25:24:1),
library DNA was ethanol-precipitated and digested
with PvuII for 2 hr at 37°C. This was done to convert the
remaining supercoiled plasmids into linear DNA molecules
and thereby improve their bindability to HAP under
our conditions. Note that PvuII does not cleave single-stranded
circles and that there are two PvuII sites in the
vector. The reaction was diluted with 2 ml loading buffer
[0.12 M sodium phosphate buffer (pH 6.8), 10 mM EDTA,
and 1% SDS] and purified by HAP chromatography at
60°C, using a column pre-equilibrated with the same
buffer (1-ml bed vol.; 0.4 g of HAP). After a 6-ml wash with
loading buffer, this volume was combined with the flow-through
fraction, and the sample was extracted twice with
water-saturated 2-butanol, once with dry 2-butanol, and
once with water-saturated ether (3 vols. per extraction).
Residual ether was blown off by vacuum and the sample
was desalted by passage through a Nensorb column (DuPont/NEN)
according to the manufacturer's specifications,
concentrated down to ~0.35 ml, and ethanol-precipitated.
Note that Gene II-Exonuclease III prepared single-stranded
DNA is in the opposite polarity to single-stranded
DNA generated by in vivo phagemid production.



Production of Purified Covalently Closed 
Single-stranded Library DNA in Vivo 

Plasmid DNA from the starting library was electroporated
into E. coli DH5αF' bacteria, and the culture was grown
under ampicillin selection at 37°C to an OD600 of 0.2,
superinfected with a 10- to 20-fold excess of the helper phage
M13K07 (Pharmacia), and harvested after 4 hr for preparation
of single-stranded plasmids, as described (Vieira and
Messing 1987).



Conversion of Single-stranded Circles to 
Double-stranded Plasmids 

Single-stranded circles (<50 ng) were ethanol-precipitated
and resuspended in 11 µl water. Then 4 µl 5× Sequenase
buffer (USB) and 1 µl primer (1 µg) were added and the
mixture was incubated at 65°C for 5 min and then at 37°C
for 3 min. Then 1 µl Sequenase version 2.0 (USB), 1 µl 0.1
M dithiothreitol (DTT), and 2 µl mixed dNTP stock (a solution
containing each deoxynucleotide at a final concentration
of 10 mM) were added, and the reaction was incubated
at 37°C for 30 min. The total volume was taken up to
100 µl with 10 mM Tris (pH 8.0) and 1 mM EDTA (TE),
and the reaction was extracted once with
phenol-chloroform-isoamyl alcohol (25:24:1). Plasmid
DNA was ethanol-precipitated and dissolved in 3 µl TE.
The following oligonucleotides were used for this
primer extension reaction: (1) M13 Reverse Sequencing
Primer (5'-AGCGGATAACAATTTCACACAGGA-3'),
which is complementary to single-stranded circles
prepared in vitro, and (2) Oligo-Amp
(5'-GACTGGTGAGTACTCAACCAAGTC-3'),
which is complementary to the ampicillin resistance
gene of single-stranded pT7T3-Pac or Lafmid BA
plasmids prepared in vivo.



In Vitro Synthesis of Library RNA 

Some 2-5 µg of double-stranded plasmid DNA from either
the starting library (see methods 2-1 and 2-2 below) or the
mini-library of abundant cDNAs (see method 2-3 below)
was linearized with either PacI (NEB) or NotI (NEB) and
used as a template for synthesis of RNA with RiboMax
Large Scale RNA Production Systems T7 or T3 (Promega),
according to the manufacturer's instructions. After treatment
with ribonuclease-free DNase I (Promega), to digest
away the plasmid DNA template, the RNA was used for
hybridization as described below. It should be noted that
RNA synthesized with T7 RNA Polymerase is in the
message-like orientation and is complementary to the
single-stranded circles produced in vitro. On the other hand,
RNA synthesized with T3 RNA Polymerase is in the
antimessage orientation and is complementary to
single-stranded circles produced in vivo.



Normalization Method 1

The procedure used for construction of the normalized
human infant brain (1NIB) library (here designated as
method 1) has been described previously (Soares et al.
1994). Method 1, with minor modifications, was also applied
to construct the normalized human fetal liver/spleen
cDNA library (1NFLS). To synthesize a partial second
strand of about 200 nt by limited extension, a 100-µl reaction
mixture containing 5 µl 0.5 µg/µl PvuII-digested,
HAP- and gel-purified single-stranded plasmid DNA from
the fetal liver/spleen starting library, 7 µl 10 ng/µl
oligo(dT)12-18 (Pharmacia), 10 µl 10× Primer Extension Buffer
[0.3 M Tris (pH 7.5), 0.5 M NaCl, and 0.15 M MgCl2], 10 µl
0.1 M DTT, 10 µl mixed dNTP stock, 25 µl mixed ddNTP
stock (a solution containing each dideoxy A, C, and G at a
final concentration of 25 mM), 5 µl 800 Ci/mmole
[α-32P]dCTP, and 20.5 µl water was incubated at 60°C for 5
min, at 50°C for 15 min, and at 37°C for 2 min. Then 7.5
µl 5 units/µl Klenow enzyme (USB) was added, and the
reaction was incubated at 37°C for 30 min. The reaction
was extracted with phenol-chloroform-isoamyl alcohol
(25:24:1), 5 µg melted and sheared salmon sperm DNA was
added, and the partially double-stranded plasmids were
purified from the remaining single-stranded circles (unprimed
molecules, as well as clones derived from mRNAs
with an internal PacI site that therefore do not contain an
oligo(dA) tail at the 3' end) by HAP chromatography. The
HAP-bound fraction containing the partially double-stranded
plasmids was eluted with 6 ml 0.4 M sodium
phosphate buffer (pH 6.8), 10 mM EDTA, and 1% SDS, and
plasmid DNA was desalted as described before (Soares et al.
1994) and ethanol-precipitated. The DNA (173 ng) was
resuspended in 2.5 µl deionized formamide and melted at
80°C for 3 min under 10 µl mineral oil. Then 1 µl of 5
µg/µl oligo(dT)12-18 (used to block the tails) was added,
and the mixture was heated at 80°C for 1 min. Then 0.5 µl
5 M NaCl, 0.5 µl 10× TE, and 6.5 µl water were added, and
the reassociation reaction was incubated at 42°C for 0.6 hr
(calculated C0t = 0.5). The remaining single-stranded
circles were purified over HAP (flow-through fraction) and
subjected subsequently to a second cycle of the normalization
procedure as described above, except that reassociation
was conducted for 24 hr (calculated C0t = 20). The
remaining single-stranded circles (normalized library;
1NFLS) were purified over HAP, converted to double-stranded
plasmids, electroporated into DH10B bacteria,
and propagated under ampicillin selection.
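
As a bookkeeping aid, the component volumes of the primer extension reaction described above can be tallied in Python to confirm that they add up to the stated 100 µl once the Klenow enzyme is included. The volumes are those given in the text (read as microliters); the dictionary keys and the helper name are illustrative labels introduced here, not part of the protocol.

```python
# Volumes in microliters, as listed for the method 1 primer extension mix.
PRIMER_EXTENSION_MIX_UL = {
    "single-stranded library DNA (0.5 ug/ul)": 5.0,
    "oligo(dT)12-18 (10 ng/ul)": 7.0,
    "10x Primer Extension Buffer": 10.0,
    "0.1 M DTT": 10.0,
    "mixed dNTP stock": 10.0,
    "mixed ddNTP stock": 25.0,
    "[alpha-32P]dCTP": 5.0,
    "water": 20.5,
}
KLENOW_UL = 7.5  # added after the annealing incubations


def total_volume(components):
    """Sum the component volumes of a reaction mix."""
    return sum(components.values())


before_enzyme = total_volume(PRIMER_EXTENSION_MIX_UL)
print(f"before Klenow: {before_enzyme} ul")
print(f"final volume:  {before_enzyme + KLENOW_UL} ul")
```

The same helper can be pointed at any of the other mixes in this section to check a transcription of the protocol before it is taken to the bench.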

Normalization Methods 2-1, 2-2, and 2-3 

Method 2 is a reassociation kinetics-based approach involving
hybridization of in vitro synthesized RNA (the
driver) derived either from the entire library (methods 2-1
and 2-2; see Fig. 2) or from a mini-library enriched for
abundant cDNAs (method 2-3; see Fig. 2), with the whole
starting library in the form of single-stranded circles (the
tracer). The remaining single-stranded circles (normalized
library) are purified by HAP chromatography (HAP flow-through
fraction), converted to double-stranded plasmids
for improvement of electroporation efficiency, electroporated
into DH10B bacteria (Life Technologies), and propagated
under ampicillin selection. A number of normalized
cDNA libraries were constructed with these methods using
single-stranded plasmids prepared both in vivo and in
vitro (see Table 1). In all three variants, the driver was first
pre-annealed with a pair of oligonucleotides to block both
5' and 3' vector sequences as follows: 0.5 µl (10 µg) of each
oligonucleotide, 1 µl RNA (5.0 µg in methods 2-1 and 2-3;
0.5 µg in method 2-2), and 4.0 µl deionized formamide
were heated for 3 min at 80°C under 10 µl mineral oil and
quickly chilled on ice. Then 0.8 µl 10× hybridization
buffer [0.4 M Pipes (pH 6.4), 4 M NaCl, and 10 mM EDTA in
methods 2-1 and 2-3; 0.4 M Pipes (pH 6.4), 1.2 M NaCl, 10
mM EDTA, and 1% SDS in method 2-2], 0.5 µl RNasin
(Boehringer Mannheim), and 0.7 µl water were added
and the mixture (total volume, 8 µl) was incubated overnight
at 42°C (methods 2-1 and 2-3) or 30°C (method 2-2).
In another tube, 2.5 µl (50 ng) single-stranded library DNA
in deionized formamide was heated for 3 min at 80°C under
mineral oil; 0.5 µl 10× hybridization buffer and 2.0 µl
water were added; and the mixture was transferred to the
tube containing the preannealed RNA. Hybridization
(13-µl reaction) was performed at 42°C (method 2-1: C0t =
5-10; method 2-3: C0t = 100-200) or at 30°C (method 2-2:
C0t = 5-10). The driver, rather than the tracer, was blocked
because otherwise the latter would, to some extent, bind to
HAP during purification. The plasmid mini-library enriched
for abundant cDNAs that served as a template for
the synthesis of RNA used as driver in method
2-3 was prepared from the HAP-bound fraction obtained
during purification of the normalized library in method
2-1. Different pairs of blocking oligonucleotides were used,
depending on whether the RNA was synthesized with
T3 or T7 RNA polymerases. To block RNA synthesized with
T3 RNA polymerase, which was used in hybridizations
with single-stranded plasmids prepared in vivo, we used:
5'- 19 AGGGCGGCCGCAAGCTTATTCCCTTTAGTGAGGGTTAAT-3'
(this oligonucleotide was used to block
5' vector sequences of all but the human fetal liver/spleen
library RNA), and
5'- 19 AGATCTTTAATTAAGCGGCCGCAAGCTTATTCCCTTTAGTGAGGGTTAAT-3'
(this oligonucleotide was used to block 5' vector sequences of the
human fetal liver/spleen library RNA), and
5'-AGGCCAAGAATTCGGCACGAG-3' (this oligonucleotide was
used to block 3' vector sequences). To block RNA synthesized
with T7 RNA polymerase, which was used in hybridizations
with single-stranded plasmids prepared in vitro,
we used: 5'-CCTCGTGCCGAATTCTTGGCCTCGAGGGCCAAATTCCC-3'
(this oligonucleotide was used to
block 5' vector sequences). The oligonucleotide used to
prime the synthesis of first-strand cDNA was also used to
block 3' vector sequences.
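
Because T3 and T7 transcripts are copied from opposite strands, the oligonucleotides that block the EcoRI-adapter side of the vector in the two cases should be reverse complements of one another over their shared segment. The Python sketch below checks this for the 3' blocker used with T3 RNA and the 5' blocker used with T7 RNA. The two sequences are entered as read from the text above (extraction may have altered individual bases), and the constant and function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Blocking oligonucleotides as read from the text (5' to 3').
T3_RNA_3PRIME_BLOCK = "AGGCCAAGAATTCGGCACGAG"
T7_RNA_5PRIME_BLOCK = "CCTCGTGCCGAATTCTTGGCCTCGAGGGCCAAATTCCC"


def covers_same_segment(oligo_a, oligo_b):
    """True if the reverse complement of oligo_a lies within oligo_b."""
    return reverse_complement(oligo_a) in oligo_b.upper()


print(covers_same_segment(T3_RNA_3PRIME_BLOCK, T7_RNA_5PRIME_BLOCK))
```

A check of this kind is a quick way to catch transcription errors in oligonucleotide sequences before ordering them.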



Normalization Method 3 

Method 3, used to generate the normalized library from 
multiple sclerosis plaques (2NbHMSP), is a reassociation- 
kinetics-based approach involving hybridization (C 0 t = 
20-25) of a 20-fold excess of Exonuclease Ill-digested 
cDNA inserts excised from a plasmid DNA preparation of 
the starting library with the library itself in the form of 
single-stranded circles, followed by HAP-purification of 
the remaining single-stranded plasmids, conversion to 
double-strands, and electroporation into bacteria. Some 5 
µg double-stranded plasmid DNA from the starting library 
was doubly digested with NotI and EcoRI; the excised 
cDNA inserts were separated from the cloning vector by 
agarose gel electrophoresis; and the DNA was purified 
using beta-agarase (NEB) according to the manufacturer's 
instructions. Then 0.6 µg gel-purified double-stranded cDNA 
inserts in 47.5 µl TE was digested with Exonuclease III at 
37°C for 30 min in a 60-µl reaction containing 6 µl 
10× Exonuclease III buffer [0.5 M Tris (pH 8.0) and 50 mM 
MgCl₂], 0.6 µl 0.1 M DTT, 2.9 µl water, and 3 µl of 65 
units/µl Exonuclease III (Life Technologies). The 
Exonuclease was then digested with 136 µg Proteinase K 
(Boehringer Mannheim) at 50°C for 15 min in a 100-µl reaction 
containing 10 mM Tris (pH 7.8), 5 mM EDTA, and 0.5% 
SDS. After two extractions with phenol-chloroform-isoamyl 
alcohol (25:24:1), the resulting noncomplementary 
single-stranded DNA (total amount ~0.3 µg) was 
ethanol-precipitated and resuspended in 1 µl TE. A 5-µl 
hybridization reaction was then set up as follows: 1 µl 
Exonuclease III-digested cDNA inserts (an estimated 
amount of 150 ng of single-stranded DNA) and 50 ng 
single-stranded plasmid DNA from the starting multiple 
sclerosis plaques library (prepared in vitro) in 2.5 µl 
deionized formamide were mixed and heated at 80°C for 3 min 
under 10 µl mineral oil. Then 0.5 µl (10 µg) of a blocking 
oligonucleotide 
(5'-CCTCGTGCCGAATTCTTGGCCTCGAGGGCCAAATTCCCTATAGTGAGTCGTATTA-3'), 
0.5 µl 5 M NaCl, and 0.5 µl 10× TE were added, and the 
mixture was incubated at 42°C for 41 hr (calculated C₀t of 23). 
The remaining single-stranded plasmids were purified by 
HAP chromatography, converted to double-stranded plas- 
mids, and electroporated into DH10B bacteria (Life Tech- 
nologies) as described above. 
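
The component volumes given for Method 3 can be checked against the stated reaction volumes. The following is a minimal sketch of that bookkeeping, using only quantities quoted in the paragraph above; the dictionary keys and helper names (`total_volume`, `final_concentration`) are illustrative, and `Decimal` is used only to avoid floating-point rounding in the sums.

```python
from decimal import Decimal as D

# Exonuclease III digestion (stated as a 60-ul reaction); volumes in microliters.
EXO_III_DIGEST_UL = {
    "cDNA inserts in TE": D("47.5"),
    "10x Exonuclease III buffer": D("6"),
    "0.1 M DTT": D("0.6"),
    "water": D("2.9"),
    "Exonuclease III, 65 units/ul": D("3"),
}

# Hybridization (stated as a 5-ul reaction); volumes in microliters.
HYBRIDIZATION_UL = {
    "Exonuclease III-digested inserts in TE": D("1"),
    "library circles in deionized formamide": D("2.5"),
    "blocking oligonucleotide": D("0.5"),
    "5 M NaCl": D("0.5"),
    "10x TE": D("0.5"),
}


def total_volume(components):
    """Sum the component volumes of a reaction."""
    return sum(components.values(), D("0"))


def final_concentration(stock, volume, total):
    """Dilution: final = stock * (volume added / total volume)."""
    return stock * volume / total


digest_total = total_volume(EXO_III_DIGEST_UL)
hyb_total = total_volume(HYBRIDIZATION_UL)

# Units of enzyme added: 3 ul at 65 units/ul.
exo_units = D("3") * D("65")
# Final NaCl (M) and formamide (% v/v) in the hybridization.
nacl_molar = final_concentration(D("5"), D("0.5"), hyb_total)
formamide_percent = final_concentration(D("100"), D("2.5"), hyb_total)

print(digest_total, hyb_total, exo_units, nacl_molar, formamide_percent)
```

Under these figures the digestion components add up to the stated 60 µl and the hybridization components to the stated 5 µl, with the hybridization carried out in 0.5 M NaCl and 50% formamide.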



Normalization Method 4 

This is a reassociation-kinetics-based approach involving 
hybridization of a 20-fold excess of cDNA inserts generated 
by PCR with the library itself in the form of single-stranded 
circles, followed by HAP purification of the remaining 
single-stranded plasmids, conversion to double-strands, 
electroporation into DH10B bacteria, and amplification 
under ampicillin selection. PCR amplification of cDNA in- 
serts was performed using the Expand High Fidelity PCR 
System (Boehringer Mannheim) according to the manu- 
facturer's instructions. This PCR system is composed of an 
enzyme mixture containing thermostable Taq DNA and 
Pwo DNA polymerases (Barnes 1994). An amount of 1 µl 
(2.5-5.0 ng) DNA template [double-stranded plasmids (fetal 
lung, parathyroid adenoma, senescent fibroblasts) or 
single-stranded circles prepared in vitro (fetal heart, 
14Nb2HFLS20W-fetal liver/spleen, and all mouse, rat, and 
schistosome libraries listed in Table 1)] was mixed with 2 
µl dNTP stock (the final concentration of each dNTP in the 
reaction is 200 µM), 5 µl of a 20-µM solution of T7 Primer 
(5'-TAATACGACTCACTATAGGG-3'), 5 µl of a 20-µM solution 
of T3 Primer (5'-ATTAACCCTCACTAAAGGGA-3'), 
10 µl 10× Expand High Fidelity buffer, 0.75 µl Expand 
High Fidelity enzyme mix (2.6 units), and 76.25 µl water. 
Then 50 µl mineral oil was added and the reaction mixture 
was subjected to the following amplification cycle 
conditions in a Perkin Elmer Thermocycler: 7 min while 
ramping up from room temperature to 94°C; 20 cycles of 1 min 
at 94°C, 2 min at 55°C, and 3 min at 72°C; and 7 min at 
72°C. PCR-amplified fragments were purified using the 
High Pure PCR Product Purification Kit (Boehringer Man- 
nheim) as instructed by the manufacturer. The purified 
PCR product was ethanol-precipitated and dissolved in 5 
µl TE. Then 1.5 µl (0.5 µg) of PCR product was mixed with 
5 µl (50 ng) library DNA (single-stranded circles prepared 
in vitro) in deionized formamide, 0.5 µl (10 µg) 5' blocking 
oligo AV-1 
(5'-CCTCGTGCCGAATTCTTGGCCTCGAGGGCCAAATTCCCTATAGTGAGTCGTATTA-3'), 
and 0.5 µl (10 µg) 3' blocking oligo AR 
(5'-ATTAACCCTCACTAAAGGGAATAAGCTTGCGGCCGCT₂₀-3'; used for all 
but the fetal liver/spleen library) or, alternatively, 0.5 µl 
(10 µg) 3' blocking oligo AV-2 
(5'-ATTAACCCTCACTAAAGGGAATAAGCTTGCGGCCGCTTAATTAAAGATCT₁₉-3'; 
used only for the fetal liver/spleen library), 
and this mixture was heated at 80°C for 3 min under 10 µl 
of mineral oil. Then 1 µl 10× buffer-A [1.2 M NaCl, 0.1 M 
Tris (pH 8.0), and 50 mM EDTA; used for fetal lung, fetal 
heart, parathyroid adenoma, senescent fibroblasts, and 
19.5-days postconception (dpc) mouse embryo] or, 
alternatively, 1 µl 10× buffer-B [1.2 M NaCl, 0.1 M Tris (pH 8.0), 
50 mM EDTA, and 10% SDS; used for 14Nb2HFLS20W-fetal 
liver/spleen, 17.5-dpc mouse embryo, 13.5- to 14.5-dpc 
mouse embryo, rat heart, rat kidney, and 8-week 
schistosome], and 1.5 µl water were added, and the hybridization 
was performed at 30°C for 24 hr (calculated C₀t ~ 5). The 
remaining single-stranded circles were purified by HAP 
chromatography, converted to double-strands, and elec- 
troporated into DH10B (Life Technologies) bacteria, as de- 
scribed above. 
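
Because the driver in Method 4 is generated by PCR with the T7 and T3 primers, the ends of each product derive from those primers. The sketch below checks how the blocking oligonucleotides relate to the primer sequences, using the sequences as given in this section. Only the portions of AR and AV-2 preceding their oligo(dT) tails are used, so the exact tail lengths do not enter into the check. Variable and function names are illustrative.

```python
# Primer and blocking-oligonucleotide sequences as given in this section.
T7_PRIMER = "TAATACGACTCACTATAGGG"
T3_PRIMER = "ATTAACCCTCACTAAAGGGA"
AV1 = "CCTCGTGCCGAATTCTTGGCCTCGAGGGCCAAATTCCCTATAGTGAGTCGTATTA"
# 5' portions only; the 3' oligo(dT) tails are deliberately omitted.
AR_WITHOUT_TAIL = "ATTAACCCTCACTAAAGGGAATAAGCTTGCGGCCGC"
AV2_WITHOUT_TAIL = "ATTAACCCTCACTAAAGGGAATAAGCTTGCGGCCGCTTAATTAAAGATC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# AV-1 ends with the reverse complement of the T7 primer, so it can
# base-pair with the T7 primer sequence at the end of a PCR product.
av1_covers_t7 = AV1.endswith(reverse_complement(T7_PRIMER))

# AR and AV-2 both begin with the T3 primer sequence itself, so they can
# base-pair with the complement of the T3 primer on the opposite strand.
ar_starts_with_t3 = AR_WITHOUT_TAIL.startswith(T3_PRIMER)
av2_starts_with_t3 = AV2_WITHOUT_TAIL.startswith(T3_PRIMER)

# AV-2 extends AR (without its tail) by the additional linker sequence.
av2_extends_ar = AV2_WITHOUT_TAIL.startswith(AR_WITHOUT_TAIL)

print(av1_covers_t7, ar_starts_with_t3, av2_starts_with_t3, av2_extends_ar)
```

On these sequences all four checks hold, which is consistent with the blocking oligonucleotides spanning the primer-derived and adjacent vector sequences at both ends of the PCR-amplified inserts.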



Subtractive Hybridization 

Double-stranded plasmid DNA from a pool of 4992 clones 
grown individually in 384-well plates (IMAGE Consortium 
plates LLAM 78-90, identification nos. 66696-67079 and 
108168-112775) derived from the normalized fetal liver/ 
spleen library (1NFLS) was prepared using the Qiagen 
Midi-prep kit according to the manufacturer's instruc- 
tions, and converted to single-stranded circles in vitro, as 
described above. Single-stranded circles were purified by 
HAP chromatography and used as a template for PCR 
amplification with T7 and T3 primers, as described above. An 
amount of 1.5 µg of PCR-amplified cDNA inserts from the 
LLAM 78-90 pool (in 4 µl deionized formamide) was 
mixed with 50 ng of single-stranded circles from the 
1NFLS library (in 2 µl deionized formamide), 2.1 µl (42 µg) 
5' blocking oligo AV-1, and 2.1 µl (42 µg) 3' blocking oligo 
AV-2. Then 10 µl mineral oil was added, and the mixture 
was heated at 80°C for 3 min. Then 1.2 µl 10× buffer-B 
and 0.6 µl water were added, and the hybridization was 
performed at 30°C for 48 hr (calculated C₀t = 27). The 
remaining single-stranded circles were purified over HAP, 
converted to double-strands, electroporated into DH10B 
bacteria, and propagated under ampicillin selection to 
generate the subtracted liver/spleen library (1NFLS-S1). 
HAP-bound DNA was also processed and purified for use in 
control experiments. 
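
The figures quoted for the driver pool and the hybridization mixture are mutually consistent, as the short sketch below illustrates. It assumes that the plate numbers and identification-number ranges are inclusive and that each well holds one clone; these assumptions, and all names in the code, are introduced for illustration only.

```python
from decimal import Decimal as D

WELLS_PER_PLATE = 384
FIRST_PLATE, LAST_PLATE = 78, 90            # LLAM 78-90, assumed inclusive
ID_RANGES = [(66696, 67079), (108168, 112775)]  # assumed inclusive


def count_ids(ranges):
    """Number of identifiers covered by a list of inclusive ranges."""
    return sum(last - first + 1 for first, last in ranges)


n_plates = LAST_PLATE - FIRST_PLATE + 1
n_wells = n_plates * WELLS_PER_PLATE
n_ids = count_ids(ID_RANGES)

# Hybridization components in microliters, as listed in the text.
HYBRIDIZATION_UL = [D("4"), D("2"), D("2.1"), D("2.1"), D("1.2"), D("0.6")]
hyb_total = sum(HYBRIDIZATION_UL, D("0"))
# Dilution factor of the 10x buffer-B stock in the final mixture.
buffer_dilution = hyb_total / D("1.2")
# Mass ratio of driver (1.5 ug = 1500 ng) to tracer (50 ng).
driver_to_tracer = D("1500") / D("50")

print(n_plates, n_wells, n_ids, hyb_total, buffer_dilution, driver_to_tracer)
```

Thirteen 384-well plates and the two identification-number ranges both give 4992 clones, and the 1.2 µl of 10× buffer-B in a 12-µl mixture corresponds to a 1× final concentration, with the PCR-amplified driver in 30-fold mass excess over the single-stranded library circles.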